Finding the Number of Clusters in Unlabelled Datasets Using Extended Cluster Count Extraction (ECCE)

Authors

Srinivasulu Asadi

Kumar Reddy

Abstract

Clustering analysis is the task of partitioning a set of objects O = {o1, ..., on} into c self-similar subsets based on available data. In general, clustering of unlabeled data poses three major problems: 1) assessing cluster tendency, i.e., how many clusters to seek; 2) partitioning the data into c meaningful groups; and 3) validating the c clusters that are discovered. All clustering algorithms ultimately rely on one or more human inputs, and the most important input is the number of clusters (c) to seek. Many pre- and post-clustering methods relieve the user of this choice, but they ultimately make it by thresholding some value in the code. The choice of c is thus transferred to the equivalent choice of a hidden threshold that determines c "automatically". In contrast, tendency assessment attempts to estimate c before clustering occurs. Here, we represent the structure of an unlabeled data set as a Reordered Dissimilarity Image (RDI), in which the pairwise dissimilarity information about a data set of n objects is displayed as an n x n image. The RDI is generated using VAT (Visual Assessment of Cluster Tendency), which highlights potential clusters as a set of "dark blocks" along the diagonal of the image, so that the number of clusters can be estimated from the number of dark blocks along the diagonal. We develop a new method called "Extended Cluster Count Extraction" (ECCE) for counting the number of clusters formed along the diagonal of the RDI.

General Terms: Data Mining, Image Processing, Artificial Intelligence.
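The reordering behind an RDI can be illustrated with a minimal sketch of the commonly described VAT ordering: start from one end of the most dissimilar pair, then repeatedly append the unordered object closest to any object already ordered. This is not the ECCE counting step, which the abstract does not specify. The function names (`pairwise_dissimilarity`, `vat_order`, `reorder`), the Euclidean dissimilarity, and the tie-breaking by index are assumptions made for illustration.

```python
import math

def pairwise_dissimilarity(points):
    """Euclidean dissimilarity matrix as a list of lists (assumed metric)."""
    return [[math.dist(p, q) for q in points] for p in points]

def vat_order(D):
    """Return a VAT-style ordering of object indices for dissimilarity matrix D."""
    n = len(D)
    # Start from one end of the most dissimilar pair of objects.
    i, _ = max(((r, c) for r in range(n) for c in range(n)),
               key=lambda rc: D[rc[0]][rc[1]])
    order = [i]
    remaining = set(range(n)) - {i}
    # nearest[k]: smallest dissimilarity from k to any already-ordered object.
    nearest = {k: D[i][k] for k in remaining}
    while remaining:
        # Append the unordered object closest to the ordered set
        # (ties broken by index, an illustrative choice).
        j = min(remaining, key=lambda k: (nearest[k], k))
        order.append(j)
        remaining.remove(j)
        for k in remaining:
            nearest[k] = min(nearest[k], D[j][k])
    return order

def reorder(D, order):
    """Permute rows and columns of D; displayed as an image, this is the RDI."""
    return [[D[a][b] for b in order] for a in order]

# Two interleaved, well-separated groups of points (made-up data).
points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
D = pairwise_dissimilarity(points)
order = vat_order(D)
rdi = reorder(D, order)
print(order)
```

After reordering, objects from the same group sit next to each other, so small dissimilarities gather in square blocks along the diagonal of `rdi`. Rendered in grayscale with low values dark, these are the "dark blocks" whose count serves as the estimate of c.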


Related resources

A Comparative study of Clustering in Unlabelled Datasets Using Extended Dark Block Extraction and Extended Cluster Count Extraction

One of the major problems in cluster analysis is the determination of the number of clusters in unlabeled data prior to clustering. In this paper, we implement a new method for determining the number of clusters called Extended Dark Block Extraction (EDBE), which is based on an existing algorithm for Visual Assessment of Cluster Tendency (VAT) of a data set. Its basic steps include 1) Generatin...


Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the tremendous and increasing growth of spatial trajectory data and the necessity of processing and extraction of useful information and meaningful patterns have led to the fact that many researchers have been attracted to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...


Finding the Number of Clusters in Unlabeled Datasets using Extended Dark Block Extraction

Clustering analysis is the problem of partitioning a set of objects O = {o1... on} into c self-similar subsets based on available data. In general, clustering of unlabeled data poses three major problems: 1) assessing cluster tendency, i.e., how many clusters to seek? 2) Partitioning the data into c meaningful groups, and 3) validating the c clusters that are discovered. We address the first pr...


Presenting a Hybrid Data Mining Model Using Association Rules and Clustering to Determine a Discounting Strategy: A Case Study of Pegah Distribution Company

Sales promotion is an important issue in most sales and distribution companies, and finding the most appropriate strategy for it is a challenge for marketers. Discounting (offering) is one sales promotion strategy. Using a fixed, constant discounting strategy for all customers and all goods reduces the chance of success. Discounting strategy needs a model for providing best ...


A Multi-Objective Approach to Fuzzy Clustering using ITLBO Algorithm

Data clustering is one of the most important areas of research in data mining and knowledge discovery. Recent research in this area has shown that the best clustering results can be achieved using multi-objective methods. In other words, assuming more than one criterion as objective functions for clustering data can measurably increase the quality of clustering. In this study, a model with two ...




Publication date: 2011
